Skip to content

fix: kill ghost agent processes on Ctrl+C#27

Closed
malpou wants to merge 7 commits into
computerlovetech:mainfrom
malpou:pm/ghost-agent-cleanup
Closed

fix: kill ghost agent processes on Ctrl+C#27
malpou wants to merge 7 commits into
computerlovetech:mainfrom
malpou:pm/ghost-agent-cleanup

Conversation

@malpou
Copy link
Copy Markdown
Contributor

@malpou malpou commented Mar 22, 2026

Summary

Fixes #26 — agent subprocesses now reliably terminate on Ctrl+C, timeout, and cancellation.

  • Add _kill_process_group helper that sends SIGTERM with 3s grace period before SIGKILL
  • Use start_new_session=True to isolate agent subprocesses in their own process group
  • Apply process group cleanup to both streaming and blocking agent modes
  • Guard os.killpg() to only fire when the child is actually a session leader

Test plan

  • All 447 existing tests pass
  • New tests verify process group isolation (start_new_session=True)
  • New tests verify cleanup kills the process group on cancellation/timeout
  • Tested on WSL2 (Linux 5.15)

malpou and others added 7 commits March 22, 2026 14:29
Add _kill_process_group() and _SESSION_KWARGS to support killing agent
processes and all their children when the ralph loop is cancelled or
times out. This is the foundation for fixing computerlovetech#26.

Co-authored-by: Ralphify <noreply@ralphify.co>
Use start_new_session=True in _run_agent_streaming so the agent and its
children form a separate process group. Replace proc.kill() calls with
_kill_process_group() for proper group-wide cleanup on timeout and in
the finally block. Also harden _kill_process_group with a pgid==pid
guard and SIGTERM-before-SIGKILL strategy for safer WSL behavior.

Co-authored-by: Ralphify <noreply@ralphify.co>
Use subprocess.Popen with start_new_session=True in the blocking path
so agent subprocesses form their own process group. On timeout or
KeyboardInterrupt, the entire group is killed via _kill_process_group,
preventing ghost processes from surviving Ctrl+C.

Co-authored-by: Ralphify <noreply@ralphify.co>
Cover _kill_process_group behavior (SIGTERM/SIGKILL escalation, session
leader guard, fallbacks) and verify both streaming and blocking modes
use start_new_session and call _kill_process_group on timeout/interrupt.

Co-authored-by: Ralphify <noreply@ralphify.co>
Remove redundant _kill_process_group integration tests that duplicated
unit test coverage, merge similar fallback tests, and trim helper
docstrings. Reduces test additions by ~100 lines without losing coverage.

Co-authored-by: Ralphify <noreply@ralphify.co>
Co-authored-by: Ralphify <noreply@ralphify.co>
Merge TestKillProcessGroup and TestProcessGroupIsolation into a single
TestProcessGroupCleanup class with a class-level pytestmark for the
win32 skipif. Remove redundant docstrings and unused os import.

Co-authored-by: Ralphify <noreply@ralphify.co>
@malpou malpou marked this pull request as ready for review March 22, 2026 14:34
Copy link
Copy Markdown
Collaborator

@kasperjunge kasperjunge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — ghost processes are a real problem and the process group isolation approach is solid.

However, I'm concerned about the 3-second SIGTERM→SIGKILL behavior as the default Ctrl+C. If the agent is mid-way through writing files or doing a multi-file refactor, a hard kill leaves the working directory in a corrupted state. That's potentially worse than the ghost process.

What I'd like instead is a two-stage Ctrl+C:

  1. First Ctrl+C — Signal the loop to stop after the current iteration finishes naturally. The agent completes its work, no orphaned process, no interrupted writes. Show a clear UI message like "Finishing current iteration… (Ctrl+C again to force kill)".
  2. Second Ctrl+C — Hard kill with the SIGTERM→SIGKILL escalation from this PR. For when the user truly wants to abort immediately and accepts the risk.

The engine already has stop_requested on RunState, so the first Ctrl+C path may mostly exist. The process group cleanup from this PR becomes the escalation path for the second Ctrl+C.

Could you rework the PR with this two-stage approach? The UX messaging is important — the user should clearly understand what each Ctrl+C does.

@kasperjunge
Copy link
Copy Markdown
Collaborator

Superseded by #38, which adds two-stage Ctrl+C (graceful stop first, force kill on second). Thanks @malpou for the process group isolation work — your commits are preserved with authorship in the new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ghost agent processes continue editing files after Ctrl+C

2 participants